Create a Medical Image Annotation Job

Project Overview

Your goal, as a product owner, is to build a product that helps doctors quickly identify cases of pneumonia in children. You'll want to build a classification system that can:

  • Help flag serious cases
  • Quickly identify healthy cases
  • Generally act as a diagnostic aid for doctors

As such, this project is designed to test your ability to build a labeled dataset that distinguishes between healthy and pneumonia x-ray images; ML engineers can later use this dataset to build a classification product. Your main task will be to create a data labeling job using Appen's platform. If you have not yet done so, you'll need to create an Appen account and log in to complete this project.

You will not be required to actually launch the data labeling job; you'll be evaluated on submitted documents and design only.

Project Submission

Your project will be graded against this project rubric. Each item on the rubric will be graded by a reviewer and marked as Passed or Needs Changes. You may resubmit your project if it does not pass, but you should try to stick to the suggested deadline for project completion.

It is recommended that you use an appropriate Appen template as a starting point for this image classification project. Using the platform, you should upload the provided xray_image_data.csv file and create a job to add labels to all of the images in that dataset.

Project Files

If you have not yet downloaded the starting project files, you can do so at this link.

You should select a good starting template, upload the data, and then change the instructions, examples, and test questions to customize the job to your specific task. To complete and pass the project, you will have to submit a single zip file that contains two documents:

  1. An HTML file, Instructions_Preview.html, that includes your instructions, examples, and some sample test questions (as can be seen when you save a job and Preview the result).
  2. A PDF Proposal file, which is a writeup that details your design considerations and strategies for quality assurance.

You will not need to launch the data labeling job, only submit the required files. Detailed instructions for what is required to complete each document can be found in the project rubric.


Other Considerations

You should design the data annotation job so that a non-expert can identify the more noticeable cases of pneumonia. Since you are designing for a non-expert annotator, you should design for failure; this means including some way to capture uncertainty in your data labels and test questions.

So, what indicates pneumonia and what kind of advice and examples can we give potential annotators?

There are a few different visual symptoms that indicate pneumonia. The most important areas to have annotators pay attention to are the lungs and the diaphragm.

  • A normal, healthy image will depict clear lungs without any areas of abnormal cloudiness/opacity; there may be structured, web-like vasculature in the lungs but otherwise that area should be clear. In healthy images, you are also more likely to see a diaphragm shadow.
  • A pneumonia image may show cloudiness/opacity in several concentrated areas or in one large area. You may also see a general pattern of opacity that obscures the structure of the lungs, heart, and diaphragm.

Designing a Data Labeling Job

This is a very challenging classification task and so you should provide clear examples and instructions to potential annotators.

  • You may choose to have annotators try to label an image as pneumonia or not (binary classification); if this is the case, you should include an Unknown or Other option to account for uncertainty in an annotation.
  • You may also choose to have annotators rate how likely they think it is that a given image shows pneumonia, measured on a numerical scale; for example, a scale from 0 to n, where 0 means no confidence and n means full confidence that the image contains pneumonia symptoms. A scale like this automatically includes room for low confidence and uncertainty.
  • In your Proposal document, you will discuss your design choices and methods for quality assurance.
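
As an illustration of how the two labeling designs above could feed into quality assurance, here is a minimal Python sketch that combines several annotators' judgments for one image. It is not part of the Appen platform or of the project requirements. The function names, the label strings ("Pneumonia", "Normal", "Unknown", "Needs review"), and the thresholds are all hypothetical choices made for this example.

```python
from collections import Counter
from statistics import mean


def aggregate_labels(judgments, min_agreement=0.7):
    """Majority vote over categorical judgments for a single image.

    Returns (label, agreement). The image is marked "Needs review" when the
    most common answer is "Unknown" or when too few annotators agree.
    The 0.7 threshold is an assumed value, not a recommendation.
    """
    if not judgments:
        raise ValueError("at least one judgment is required")
    top_label, top_count = Counter(judgments).most_common(1)[0]
    agreement = top_count / len(judgments)
    if top_label == "Unknown" or agreement < min_agreement:
        return "Needs review", agreement
    return top_label, agreement


def aggregate_scores(scores, n=4, low=0.25, high=0.75):
    """Average 0-n confidence scores for a single image and map to a label.

    Scores are normalized to the range 0-1. Values between `low` and `high`
    are treated as uncertain. The cutoffs are assumed values.
    """
    if not scores:
        raise ValueError("at least one score is required")
    normalized = mean(scores) / n
    if normalized >= high:
        return "Pneumonia", normalized
    if normalized <= low:
        return "Normal", normalized
    return "Needs review", normalized


if __name__ == "__main__":
    print(aggregate_labels(["Pneumonia", "Pneumonia", "Pneumonia", "Normal"]))
    print(aggregate_labels(["Pneumonia", "Normal", "Unknown"]))
    print(aggregate_scores([4, 4, 3], n=4))
    print(aggregate_scores([2, 1, 3], n=4))
```

Either way, images that end up as "Needs review" are the ones where non-expert annotators disagree or are unsure, which is the kind of uncertainty your Proposal should explain how to handle.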

It is suggested that you start with an Appen job template and customize it to this particular task.